WiCE: Real-World Entailment for Claims in Wikipedia
Textual entailment models are increasingly applied in settings like
fact-checking, presupposition verification in question answering, or summary
evaluation. However, these represent a significant domain shift from existing
entailment datasets, and models underperform as a result. We propose WiCE, a
new fine-grained textual entailment dataset built on natural claim and evidence
pairs extracted from Wikipedia. In addition to standard claim-level entailment,
WiCE provides entailment judgments over sub-sentence units of the claim, and a
minimal subset of evidence sentences that support each subclaim. To support
this, we propose an automatic claim decomposition strategy using GPT-3.5 which
we show is also effective at improving entailment models' performance on
multiple datasets at test time. Finally, we show that real claims in our
dataset involve challenging verification and retrieval problems that existing
models fail to address.
Comment: EMNLP 202
Fair Abstractive Summarization of Diverse Perspectives
People from different social and demographic groups express diverse
perspectives and conflicting opinions on a broad set of topics such as product
reviews, healthcare, law, and politics. A fair summary should provide a
comprehensive coverage of diverse perspectives without underrepresenting
certain groups. However, current work in summarization metrics and large
language model (LLM) evaluation has not explored fair abstractive
summarization. In this paper, we systematically investigate fair abstractive
summarization for user-generated data. We first formally define fairness in
abstractive summarization as not underrepresenting perspectives of any groups
of people and propose four reference-free automatic metrics measuring the
differences between target and source perspectives. We evaluate five LLMs,
including three GPT models, Alpaca, and Claude, on six datasets collected from
social media, online reviews, and recorded transcripts. Experiments show that
both the model-generated and the human-written reference summaries suffer from
low fairness. We conduct a comprehensive analysis of the common factors
influencing fairness and propose three simple but effective methods to
alleviate unfair summarization. Our dataset and code are available at
https://github.com/psunlpgroup/FairSumm.
Comment: 19 pages, 10 figures
Shortcomings of question answering based factuality frameworks for error localization
Despite recent progress in abstractive summarization, models often generate summaries with factual errors. Numerous approaches to detect these errors have been proposed, the most popular of which are question answering (QA)-based factuality metrics. These have been shown to work well at predicting summary-level factuality and have potential to localize errors within summaries, but this latter capability has not been systematically evaluated in past research. In this paper, we conduct the first such analysis and find that, contrary to our expectations, QA-based frameworks fail to correctly identify error spans in generated summaries and are outperformed by trivial exact match baselines. Our analysis reveals a major reason for such poor localization: questions generated by the question generation (QG) module often inherit errors from non-factual summaries, which are then propagated further into downstream modules. Moreover, even human-in-the-loop question generation cannot easily offset these problems. Our experiments conclusively show that there exist fundamental issues with localization using the QA framework which cannot be fixed solely by stronger QA and QG models.
Computer Science
Lung adenocarcinoma and adrenocortical carcinoma in a patient with multiple endocrine neoplasia type 1
Multiple endocrine neoplasia type 1 (MEN1) is an autosomal dominant disorder caused by heterozygous germline mutations in the tumor suppressor gene MEN1, which encodes a nuclear protein, menin. MEN1 is characterized by the combined occurrence of tumors involving the pituitary gland, pancreatic islets, and parathyroid glands. Additionally, patients with MEN1 often exhibit adrenal tumors. Although most MEN1-associated tumors are benign, malignant lesions arising in these endocrine organs have been reported. Additionally, malignant diseases of non-endocrine organs concomitant with MEN1 have also been reported. Here, we report a rare case of a MEN1 patient who exhibited adrenocortical carcinoma (ACC) and lung adenocarcinoma (LAC).
A 53-year-old Japanese woman was diagnosed with genetically proven MEN1 that initially manifested as parathyroid, pancreatic, and adrenal tumors. During the course of the disease, she developed LAC harboring epidermal growth factor receptor gene mutations and cortisol-secreting ACC. Both tumors were surgically resected. The tumor cells were immunohistochemically negative for menin.
Studies have suggested a causative link between MEN1 gene mutations and ACC, and menin expression may decrease in MEN1-related ACCs. In contrast, there are few reports suggesting a specific role of MEN1 gene mutations in LAC. Menin is often inactivated in the LACs of patients without MEN1. Thus, our patient's ACC probably occurred as part of MEN1, whereas MEN1 had no evident etiological association with her LAC. This case demonstrates the need for physicians to consider the potential development of malignant diseases originating from both endocrine and non-endocrine organs in MEN1 patients.